skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Edelman, Benjamin"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Large language models have the ability to generate text that mimics patterns in their inputs. We introduce a simple Markov Chain sequence modeling task in order to study how this in-context learning (ICL) capability emerges. In our setting, each example is sampled from a Markov chain drawn from a prior distribution over Markov chains. Transformers trained on this task form \emph{statistical induction heads} which compute accurate next-token probabilities given the bigram statistics of the context. During the course of training, models pass through multiple phases: after an initial stage in which predictions are uniform, they learn to sub-optimally predict using in-context single-token statistics (unigrams); then, there is a rapid phase transition to the correct in-context bigram solution. We conduct an empirical and theoretical investigation of this multi-phase process, showing how successful learning results from the interaction between the transformer's layers, and uncovering evidence that the presence of the simpler unigram solution may delay formation of the final bigram solution. We examine how learning is affected by varying the prior distribution over Markov chains, and consider the generalization of our in-context learning of Markov chains (ICL-MC) task to n-grams for n is greater than 2. 
    more » « less
  2. In many predictive decision-making scenarios, such as credit scoring and academic testing, a decision-maker must construct a model that accounts for agents' incentives to ``game'' their features in order to receive better decisions. Whereas the strategic classification literature generally assumes that agents' outcomes are not causally dependent on their features (and thus strategic behavior is a form of lying), we join concurrent work in modeling agents' outcomes as a function of their changeable attributes. Our formulation is the first to incorporate a crucial phenomenon: when agents act to change observable features, they may as a side effect perturb unobserved features that causally affect their true outcomes. We consider three distinct desiderata for a decision-maker's model: accurately predicting agents' post-gaming outcomes (accuracy), incentivizing agents to improve these outcomes (improvement), and, in the linear setting, estimating the visible coefficients of the true causal model (causal precision). As our main contribution, we provide the first algorithms for learning accuracy-optimizing, improvement-optimizing, and causal-precision-optimizing linear regression models directly from data, without prior knowledge of agents' possible actions. These algorithms circumvent the hardness result of Miller et al. (2019) by allowing the decision maker to observe agents' responses to a sequence of decision rules, in effect inducing agents to perform causal interventions for free. 
    more » « less
  3. null; null; null; null; null; null (Ed.)